Complete Metrics System Standardization and Real-Time Fixes#16

Merged: elchinoo merged 4 commits into main from v2-redesign-core on Aug 6, 2025

@elchinoo elchinoo commented Aug 6, 2025

Overview

This PR completes the review and standardization of the metrics system across both plugins (TPC-C and bulk-load). It fixes the core design flaw in background metrics collection: reporting cumulative totals where per-interval values were needed.

Key Changes

Background Metrics Redesign

  • Fixed Cumulative Bug: Both plugins now calculate delta/incremental metrics instead of misleading cumulative totals
  • Real-Time Accuracy: Background metrics show actual work done in each 1-second interval, not running totals since test start
  • TPC-C Plugin: Added delta tracking (prevTotalTxns, prevSaveTime, prevMetricsMu) with proper initialization/reset
  • Bulk-Load Plugin: Already fixed in previous commits, now fully consistent

Metrics Consistency

  • Standardized Naming: Background metrics use 'interval_*' prefix (interval_transaction_rate, interval_tps, etc.)
  • Meaningful Tags: Added interval_transactions, interval_seconds for proper rate calculation analysis
  • Time Windows: All metrics now use proper start_time/end_time representing actual measurement intervals

Tag Structure Cleanup

  • Removed Null Fields: Eliminated batch_size=null, total_rows=null, batch_count=null from inappropriate contexts
  • Plugin-Specific Tags: TPC-C uses scale_factor, bulk-load uses batch_size - no forced standardization
  • Cleaner Analysis: Simplified tag structures make querying and visualization much easier
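
One way to picture the tag cleanup, assuming tags are serialized as JSON (an assumption; the PR does not state the storage format): a single shared struct with optional fields emits `null` for whatever a plugin does not use, while per-plugin structs emit only relevant keys. All type names below are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sharedTags mimics a forced common shape: unused fields become null.
type sharedTags struct {
	ScaleFactor *int `json:"scale_factor"`
	BatchSize   *int `json:"batch_size"`
}

// tpccTags and bulkLoadTags carry only what each plugin actually uses.
type tpccTags struct {
	ScaleFactor int `json:"scale_factor"`
}

type bulkLoadTags struct {
	BatchSize int `json:"batch_size"`
}

// encodeTags serializes any tag struct to a JSON string.
func encodeTags(tags any) (string, error) {
	b, err := json.Marshal(tags)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	scale := 10
	examples := []any{
		sharedTags{ScaleFactor: &scale}, // {"scale_factor":10,"batch_size":null}
		tpccTags{ScaleFactor: 10},       // {"scale_factor":10}
		bulkLoadTags{BatchSize: 1000},   // {"batch_size":1000}
	}
	for _, tags := range examples {
		s, err := encodeTags(tags)
		if err != nil {
			fmt.Println("encode error:", err)
			continue
		}
		fmt.Println(s)
	}
}
```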

Database Schema Compatibility

Problem Solved

Before: Background metrics showed meaningless cumulative totals:

After: Real-time interval metrics:

Testing

  • ✅ Both plugins compile successfully
  • ✅ Main StormDB binary builds without errors
  • ✅ No linting errors or compilation issues
  • ✅ Backward compatible with existing test configurations

Impact

  • Real-Time Monitoring: Proper per-second metrics for live dashboard monitoring
  • Accurate Analysis: True performance trends instead of misleading cumulative data
  • Better UX: Clean, focused tag structures for easier querying and visualization
  • Foundation: Solid metrics foundation for future enhancements

Ready for production testing and integration.

…cumulative

- Replace cumulative metrics collection with real-time per-batch metrics
- Add currentBatch tracking to monitor which batch size is actively being tested
- Background metrics now save current transaction/row rates for the active batch
- Remove confusing cumulative metrics across different batch sizes
- Metrics now show what's happening 'right now' for better real-time analysis
- Add batch size to all metric tags for proper categorization
- Reset current batch tracking when test completes

This fixes the logical flaw where cumulative metrics across different batch
sizes made no sense for real-time monitoring and analysis.
- Clean up tags by removing null batch_count entries
- batch_count doesn't make sense for real-time current batch metrics
- Simplifies tag structure and removes confusing null values
…mulative

- Add previous metrics tracking (prevTransactions, prevRows, prevSaveTime)
- Calculate delta values between saves for true interval metrics
- Change metric_type to 'interval_*' to reflect actual behavior
- Add interval_transactions, interval_rows, interval_seconds to tags
- Initialize and reset previous metrics at batch start/end
- Skip first iteration until we have baseline previous values

Now background metrics show actual work done in each 1-second interval
instead of cumulative totals since test start. This provides proper
real-time monitoring with meaningful rate calculations.
Background Metrics Improvements:
- Fix TPC-C plugin to use delta calculation instead of cumulative metrics
- Add previous metrics tracking (prevTotalTxns, prevSaveTime) to TPC-C
- Update TPC-C to save 'interval_*' metrics with proper time window calculations
- Initialize/reset previous metrics at connection level start in TPC-C

Final Results Cleanup:
- Remove unnecessary null fields (batch_size, total_rows, batch_count) from TPC-C final results
- Remove batch_count field from bulk-load final results
- Simplify tag structures to only include relevant fields per plugin type

Consistency Improvements:
- Standardize metric_type naming: 'interval_*' for background, specific names for final
- Ensure all background metrics calculate rates based on actual work done in time interval
- Clean tag structures across both plugins for better analysis and filtering

Both plugins now provide true real-time monitoring with meaningful
interval-based metrics instead of misleading cumulative calculations.
@elchinoo elchinoo merged commit 53aa7ba into main Aug 6, 2025
1 check failed